Semantic Embedding Space for Zero-Shot Action Recognition
The number of categories for action recognition is growing rapidly. It is
thus becoming increasingly hard to collect sufficient training data to learn
conventional models for each category. This issue may be ameliorated by the
increasingly popular 'zero-shot learning' (ZSL) paradigm. In this framework a
mapping is constructed between visual features and a human interpretable
semantic description of each category, allowing categories to be recognised in
the absence of any training data. Existing ZSL studies focus primarily on image
data, and attribute-based semantic representations. In this paper, we address
zero-shot recognition in contemporary video action recognition tasks, using
semantic word vector space as the common space to embed videos and category
labels. This is more challenging because the mapping between the semantic space
and space-time features of videos containing complex actions is more complex
and harder to learn. We demonstrate that a simple self-training and data
augmentation strategy can significantly improve the efficacy of this mapping.
Experiments on human action datasets including HMDB51 and UCF101 demonstrate
that our approach achieves the state-of-the-art zero-shot action recognition
performance.
Comment: 5 page
Deep Multi-task Representation Learning: A Tensor Factorisation Approach
Most contemporary multi-task learning methods assume linear models. This
setting is considered shallow in the era of deep learning. In this paper, we
present a new deep multi-task representation learning framework that learns
cross-task sharing structure at every layer in a deep network. Our approach is
based on generalising the matrix factorisation techniques explicitly or
implicitly used by many conventional MTL algorithms to tensor factorisation, to
realise automatic learning of end-to-end knowledge sharing in deep networks.
This is in contrast to existing deep learning approaches that need a
user-defined multi-task sharing strategy. Our approach applies to both
homogeneous and heterogeneous MTL. Experiments demonstrate the efficacy of our
deep multi-task representation learning in terms of both higher accuracy and
fewer design choices.
Comment: 9 pages, Accepted to ICLR 2017 Conference Track. This is a conference
version of the paper. For the multi-domain learning part (not in this
version), please refer to https://arxiv.org/pdf/1605.06391v1.pd
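One simple way to picture sharing through factorisation is to stack the per-task weight matrices of a layer into a 3-way tensor and express each task's matrix as a combination of shared basis matrices. The sketch below shows only that reconstruction step, with toy values. The names `shared_bases` and `task_coeffs` are hypothetical, and this particular factorisation is an illustrative assumption rather than the specific scheme used in the paper.

```python
def combine(bases, coeffs):
    """Return sum_k coeffs[k] * bases[k] for equally shaped matrices."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(c * B[i][j] for c, B in zip(coeffs, bases)) for j in range(cols)]
        for i in range(rows)
    ]

# Shared factors: K = 2 basis matrices for one 2x2 layer (toy values).
shared_bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

# Task-specific factors: one coefficient per basis matrix and task.
task_coeffs = {"task_a": [1.0, 0.0], "task_b": [0.5, 2.0]}

# W[:, :, t] = sum_k shared_bases[k] * task_coeffs[t][k]
task_weights = {t: combine(shared_bases, c) for t, c in task_coeffs.items()}
```

In training, both the shared bases and the per-task coefficients would be learned by backpropagation, so how much each layer shares is determined by the data rather than fixed in advance.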
Zero-Shot Domain Adaptation via Kernel Regression on the Grassmannian
Most visual recognition methods implicitly assume the data distribution
remains unchanged from training to testing. However, in practice domain shift
often exists, where real-world factors such as lighting and sensor type change
between train and test, and classifiers do not generalise from source to target
domains. It is impractical to train separate models for all possible situations
because collecting and labelling the data is expensive. Domain adaptation
algorithms aim to ameliorate domain shift, allowing a model trained on a source
to perform well on a different target domain. However, even for the setting of
unsupervised domain adaptation, where the target domain is unlabelled,
collecting data for every possible target domain is still costly. In this
paper, we propose a new domain adaptation method that has no need to access
either data or labels of the target domain when it can be described by a
parametrised vector and there exist several related source domains within the
same parametric space. It greatly reduces the burden of data collection and
annotation, and our experiments show some promising results.
Comment: Accepted to BMVC 2015 Workshop on Differential Geometry in Computer
Vision (DIFF-CV
BayesDLL: Bayesian Deep Learning Library
We release a new Bayesian neural network library for PyTorch for large-scale
deep networks. Our library implements mainstream approximate Bayesian inference
algorithms: variational inference, MC-dropout, stochastic-gradient MCMC, and
Laplace approximation. The main differences from other existing Bayesian neural
network libraries are as follows: 1) Our library can deal with very large-scale
deep networks including Vision Transformers (ViTs). 2) We need virtually zero
code modifications for users (e.g., the backbone network definition codes do
not need to be modified at all). 3) Our library also allows the pre-trained
model weights to serve as a prior mean, which is very useful for performing
Bayesian inference with the large-scale foundation models like ViTs that are
hard to optimise from scratch with the downstream data alone. Our code is
publicly available at: https://github.com/SamsungLabs/BayesDLL (a mirror
repository is also available at: https://github.com/minyoungkim21/BayesDLL).
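Using pre-trained weights as a prior mean can be pictured as a penalty that grows as the weights move away from their pre-trained values. The sketch below computes the negative log density of an isotropic Gaussian prior centred on the pre-trained weights, up to an additive constant. It is a plain-Python illustration and not the BayesDLL API; the function name, the weight values and `sigma` are assumptions made for the example.

```python
def gaussian_prior_penalty(weights, pretrained, sigma):
    """Negative log of N(weights | pretrained, sigma^2 I), up to a constant."""
    squared_distance = sum((w - m) ** 2 for w, m in zip(weights, pretrained))
    return squared_distance / (2.0 * sigma ** 2)

# Hypothetical pre-trained weights acting as the prior mean.
pretrained = [0.5, -1.0, 2.0]

# No penalty when the weights sit exactly at the prior mean.
at_prior = gaussian_prior_penalty(pretrained, pretrained, sigma=1.0)

# Moving one weight by 1.0 costs 1.0 / (2 * sigma^2).
moved = gaussian_prior_penalty([1.5, -1.0, 2.0], pretrained, sigma=1.0)
```

A smaller `sigma` keeps the posterior closer to the pre-trained model, which matters when the downstream data alone is too small to fit a large network.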
A Unified Perspective on Multi-Domain and Multi-Task Learning
In this paper, we provide a new neural-network based perspective on
multi-task learning (MTL) and multi-domain learning (MDL). By introducing the
concept of a semantic descriptor, this framework unifies MDL and MTL as well as
encompassing various classic and recent MTL/MDL algorithms by interpreting them
as different ways of constructing semantic descriptors. Our interpretation
provides an alternative pipeline for zero-shot learning (ZSL), where a model
for a novel class can be constructed without training data. Moreover, it leads
to a new and practically relevant problem setting of zero-shot domain
adaptation (ZSDA), which is analogous to ZSL but for novel domains: A model
for an unseen domain can be generated by its semantic descriptor. Experiments
across this range of problems demonstrate that our framework outperforms a
variety of alternatives.
Comment: 9 pages, Accepted to ICLR 2015 Conference Trac
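Generating a model from a semantic descriptor can be sketched in a few lines. The example assumes the simplest possible generator, a linear one: a learned matrix `W` turns a descriptor `z` into the weights of a linear predictor, so an unseen domain needs only its descriptor and no training data. The matrix, the descriptor and the input are toy values chosen for illustration, and the linear form is an assumption rather than the paper's exact architecture.

```python
def generate_weights(W, z):
    """Produce predictor weights for a domain or task from its descriptor z."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]

def predict(weights, x):
    """Score an input x with the generated linear predictor."""
    return sum(w * xi for w, xi in zip(weights, x))

# Hypothetical learned generator: 2 input features, 3-d descriptors.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]

# Descriptor of a domain never seen in training, for example a new
# combination of factors that each appeared separately in training.
z_unseen = [1.0, 0.0, 1.0]

weights = generate_weights(W, z_unseen)
score = predict(weights, [2.0, 1.0])
```

With one-hot descriptors each domain or task receives its own independent weights, while descriptors that share entries share the corresponding columns of `W`.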
Trace Norm Regularised Deep Multi-Task Learning
We propose a framework for training multiple neural networks simultaneously.
The parameters from all models are regularised by the tensor trace norm, so
that each neural network is encouraged to reuse others' parameters if possible
-- this is the main motivation behind multi-task learning. In contrast to many
deep multi-task learning models, we do not predefine a parameter sharing
strategy by specifying which layers have tied parameters. Instead, our
framework considers sharing for all shareable layers, and the sharing strategy
is learned in a data-driven way.
Comment: Submission to Workshop track - ICLR 201